Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switchable buffering for Stdout #78515

Closed

Conversation

Lucretiel
Copy link
Contributor

@Lucretiel Lucretiel commented Oct 29, 2020

Summary

This PR adds "switchable" buffering to Stdout. Instead of being unconditionally line-buffered, stdout can now be put in one of 3 modes: line buffering, which is the current behavior; block buffering, which mirrors BufWriter; and unbuffered, which (with one small exception, see Design Notes) forwards all writes directly to the underlying device.

This PR additionally contains some proposed implementation updates to BufWriter, with a focus on reducing total device write calls in the average case. These updates are based on lessons learned from the design of LineWriterShim, as well as some discussions with @workingjubilee on this topic. This could be considered "out of scope" for this PR, so at the discretion of the reviewers I'm happy to move it to a separate PR, but I was "in the neighborhood", so to speak. See Design Notes for details about these changes. Moved to #78551

This PR moves us another step closer to #60673, which calls for stdout to use block buffering when appropriate (ie, when stdout is a not a tty). This PR does not change any current startup behavior; stdout is still unconditionally line buffered by default. Instead, this PR creates only an implementation of switchable buffering (which lives in the SwitchWriter type), which future PRs (probably individual ones for different platforms) can use to select an appropriate startup default for their launch environment.

Side note on terminology

Throughout this proposal I'll be referring to writers flushing / buffers being flushed; unless I specify otherwise, I always mean that the data is being flushed one level, to the wrapped io::Write writer. "Full" flushing (ie, forwarding all the way though the stack to the underlying sink) is assumed to only happen on an explicit user call to flush.

Design notes

SwitchWriter

The core of this PR is the SwitchWriter type. It's fairly simple; it has a current mode, and delegates all write operations to its BufWriter as follows:

  • In block buffered mode, it forwards directly to the BufWriter.
  • In line buffered mode, it creates a LineWriterShim (a wrapper around &mut BufWriter, implementing line buffering logic) and forwards to it. Treatment of existing buffered data is entirely deferred to LineWriterShim.
  • In unbuffered mode, it ensures the BufWriter is flushed, then forwards directly to the io::Write contained by the BufWriter.
    • A small exception: because write_fmt tends to be a lot of very short writes, it is buffered into the BufWriter and immediately flushed. The write still completes immediately, but we assume in practice that a caller of write_fmt on an unbuffered SwitchWriter doesn't care if every individual write that entails is forwarded directly to the writer, as much as they care that the write is forwarded immediately.

Changing the buffering mode never performs any synchronous i/o operations; it only changes the behavior of subsequent writes. The treatment of buffered content during a mode shift is left deliberately unspecified; in practice it's treated by whatever the behavior of the current mode is. For instance, when switching from block buffering to line buffering, any previously buffered content will (usually) remain buffered until a new line is written by the user. SwitchWriter does not retroactively scan the buffer for newlines to flush after such a mode switch. Conversely, it's possible (though unlikely) for partial writes from the underlying device to cause lines in line buffering mode to be buffered without being flushed. LineWriter is designed to retry these writes ASAP (on subsequent write calls), but switching to block buffered mode will cause this content to remain buffered. It is assumed that, in cases where precision is needed, the user would flush before switching modes.

Strictly speaking, it isn't necessary for the fulfillment of #60673 for stdout to be switchable, only that the mode is set based on some environmental factors at startup. In practice, however, this means that the implementation needs to be able to switch over the current mode, so it's a trivial addition to make it further switchable at runtime. This has the side benefit of making it easy to break #60673 into different parts; without this public interface, the PR fails the "unused code" checks, and would require an implementation of isatty to be added immediately. Because this new API is unstable, I figure we can always remove it after #60673 is complete if we find it isn't justifying its inclusion.

stdout interface changes

This change adds buffer_mode and set_buffer_mode to StdoutLock, which cause the buffering mode to be globally switched. There's not much else to talk about here. Currently SwitchWriter is not exported publicly (though it could be someday), so this API is the only way right now for any of this behavior to be accessed.

Follow up items

The functional aspects of this PR are complete, but there are still several open tasks that need to be completed. Some are code related (write comprehensive tests & docs), which others are publishing related (this PR adds new public methods to StdoutLock, which need to be discussed and approved).

Tasks

  • SwitchWriter tests
  • stdout switching tests
    • Is stdout tested specifically as a write target? I see tests for panic consistency / recovery, and extensive tests for BufWriter, LineWriter, etc., but didn't immediately see tests for "writing to stdout".
  • Comprehensive documentation of new public API
  • Comprehensive documentation of SwitchWriter; even though it's not exported publicly, we still want it to be documented for developers reading it.

Open questions

While I've made assumptions about these, and described my rationale as much as possible above, I wanted to call out the specific things that I wanted to be sure are addressed in review.

  • Should stdout be made runtime switchable by standard library clients at all?
    • Emphatically yes.
    • If so, approve my naming scheme, or come up with a preferable one.
    • if so, do we like the trivial get/set interface, or is another design warrented?
    • Should the buffer switching interface be available on Stdout, as well as StdoutLock?
  • Do we like these buffer mode choices? I was torn about whether to expose unbuffered, or just stick to Block / Line buffering. We need an unbuffered mode internally (for stdout cleanup at program shutdown) but we don't necessarily have to expose it.

Next steps

Assuming this PR gets merged (or, at least, the SwitchWriter part), the only remaining step is to actually create an interface for the platform to report its preferred stdout configuration based on the startup environment. This would be nearly identical to the way stdout_raw or is_ebadf are handled today: a standard interface is implemented in the various sys crates and consumed via a uniform interface in io. My envisioned interface looks like:

/// Get the stdout buffering mode based on the startup
/// environment. For instance, on unix platforms, this will
/// enable line-buffering if stdout is a tty and block-buffering
/// otherwise. Also returns the default buffer size.
fn get_default_stdout_mode() -> (BufferMode, usize) { ... }

This will, of course, be discussed in more detail in that future PR.

@rust-highfive
Copy link
Collaborator

r? @dtolnay

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 29, 2020
@jyn514 jyn514 added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` labels Oct 29, 2020
Copy link
Member

@m-ou-se m-ou-se left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Some first comments:

library/std/src/io/buffered/switchwriter.rs Outdated Show resolved Hide resolved
library/std/src/io/buffered/switchwriter.rs Show resolved Hide resolved
library/std/src/io/stdio.rs Show resolved Hide resolved
library/std/src/io/buffered/switchwriter.rs Show resolved Hide resolved
library/std/src/io/buffered/switchwriter.rs Show resolved Hide resolved
@m-ou-se
Copy link
Member

m-ou-se commented Oct 29, 2020

This PR additionally contains some proposed implementation updates to BufWriter

If you can easily separate those, having that in a separate PR would be nice.

Should stdout be made runtime switchable by standard library clients at all?

I'd say so, yes, absolutely. program | grep something can get pretty annoying if program doesn't flush every line. Many programs have --line-buffered as an option, and Rust programs should be able to do that too. I don't think any automatic (isatty-based) block buffering of stdout can be enabled without providing a stable way to switch to line buffered mode.

@workingjubilee
Copy link
Member

A lot can happen with streams. A pipe can be opened, redirected, closed, et cetera. The ability to respond to such a dynamic environment is a must. We should offer an interface morally equivalent to https://linux.die.net/man/3/setvbuf (which is a C standard, so it should apply cross-OS), while also being mindful that an author may want to manage their interactions in response to user-space commands like https://www.gnu.org/software/coreutils/manual/html_node/stdbuf-invocation.html or similar.

And if possible we would like to write to terminals using line buffering by default and write to non-terminals using block buffering by default, following the convention of most environments. That doesn't have to go in this PR, but we may wish to open a follow up issue.

I am given to understand that Microsoft Windows has interesting nuances to be aware of that don't mesh with a Unix-dominant view of how things should go, in particular in its cmd.exe prompt and its NTFS and ReFS file systems, so it would be a good idea to eventually collect input from someone who was well-versed in those before stabilizing this, presuming it lands. Though I am also given to understand the recent Windows Terminal more closely reflects interfaces in other OS: https://github.com/Microsoft/Terminal

@Lucretiel
Copy link
Contributor Author

Lucretiel commented Oct 29, 2020

I am given to understand that Microsoft Windows has interesting nuances to be aware of that don't mesh with a Unix-dominant view of how things should go, in particular in its cmd.exe prompt and its NTFS and ReFS file systems, so it would be a good idea to eventually collect input from someone who was well-versed in those before stabilizing this, presuming it lands.

Yeah, for sure; this is why I wanted to get the "switchable" behavior approved & merged before setting out to actually start picking default behaviors. This way we can add platform-specific logic in separate PRs, each with feedback from experts in the respective platforms.

@Lucretiel
Copy link
Contributor Author

If you can easily separate those, having that in a separate PR would be nice.

Done.

@Lucretiel
Copy link
Contributor Author

Though I am also given to understand the recent Windows Terminal more closely reflects interfaces in other OS:

As it happens, I actually have some indirect experience here; we encountered some odd behavior where console system calls were failing in alacrity and Windows Terminal but not in cmd. We were never able to identify a root cause, since the relevant system call doesn't exhaustively document the conditions under which it fails, but my best guess right now is it has to do with the Alacritty and Windows Terminal interacting with ANSI command codes in a way that cmd does not:

@m-ou-se
Copy link
Member

m-ou-se commented Oct 30, 2020

I'm wondering if it might make sense to not add a new type (SwitchWriter), but instead extend BufWriter. Thanks to the panicked field it has, that struct already has 3 or 7 bytes of padding. So adding a flush_lines: bool wouldn't make it any bigger. Checking that boolean to use a LineWriterShim in its methods is probably very cheap.

(And BufWriter::with_capacity(0, stream) already implements the unbuffered option.)

(It'd also save some effort in having to come up with a clear name for SwitchWriter. ^^')

@Lucretiel
Copy link
Contributor Author

I agree that it'd be nice to have the space savings, but I disagree that this functionality should be collapsed into BufWriter:

  • Users of BufWriter today always want its buffered logic; we shouldn't introduce the conditional logic into those methods.
  • LineWriterShim is entirely definied in terms of the buffering logic primitives (write_to_buf, flush_buf) provided by BufWriter. Giving BufWriter a switchable mode would circularize the dependency chain there.

@crlf0710 crlf0710 added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 20, 2020
@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 7, 2020
@bors
Copy link
Contributor

bors commented Dec 11, 2020

☔ The latest upstream changes (presumably #77801) made this pull request unmergeable. Please resolve the merge conflicts.

Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:

@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author

@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 11, 2021
@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 1, 2021
@Dylan-DPC-zz
Copy link

@Lucretiel any updates on this?

@Lucretiel Lucretiel closed this Feb 19, 2021
@Dylan-DPC
Copy link
Member

@Lucretiel you'll need another rebase ;)

@the8472
Copy link
Member

the8472 commented Feb 19, 2022

And after rebasing you should do (at)rustbot ready to update the status so it has a chance to get a review before it bitrots again.

@Lucretiel
Copy link
Contributor Author

And after rebasing you should do (at)rustbot ready to update the status so it has a chance to get a review before it bitrots again.

Will do, thank you for the advice

@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 20, 2022
@Dylan-DPC
Copy link
Member

@Lucretiel any updates?

@Lucretiel
Copy link
Contributor Author

Sorry for more delays. Working on it tonight

@Lucretiel Lucretiel force-pushed the stdout-switchable-buffering branch from dc7c342 to 1e564bc Compare May 12, 2022 03:43
@Lucretiel Lucretiel force-pushed the stdout-switchable-buffering branch from 1e564bc to a69dca8 Compare May 12, 2022 04:23
@Lucretiel
Copy link
Contributor Author

Rebase finished; conflicts resolved.

@Lucretiel
Copy link
Contributor Author

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 12, 2022
@bors
Copy link
Contributor

bors commented Jun 19, 2022

☔ The latest upstream changes (presumably #98242) made this pull request unmergeable. Please resolve the merge conflicts.

@m-ou-se m-ou-se added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Jun 20, 2022
@m-ou-se
Copy link
Member

m-ou-se commented Jun 20, 2022

This is still stuck on an API design question: #78515 (comment)

Nominated for libs-api discussion.

@m-ou-se
Copy link
Member

m-ou-se commented Jun 28, 2022

We've discussed this pull request in the recent libs-api meeting.

This PR seems to be stuck because it's too broad, making it hard to focus on a single decision. It would probably be a lot easier to make progress here if it was split into parts that each can be decided separately.

Here's a few thoughts that came up in the meeting:

  • We're generally in favor of an API to allow dynamically setting block buffering or unbuffered mode for stdout.
  • The API for setting the buffering mode of stdout can be proposed separately from the SwitchWriter API, to avoid mixing those discussions.
  • While dynamically switching between buffer modes at runtime seems reasonable for stdout, it seems like a rather niche problem that's specific to stdout. This makes it questionable whether it's worth it to add a SwitchWriter type for this problem. The usefullness of Stdout::set_buffer_mode doesn't necessarily mean that something like SwitchWriter is as useful too.
  • Merging the SwitchWriter behavior into BufWriter doesn't seem great, but it might be a good idea to merge it into LineWriter: LineWriter could have a switch to turn line-buffering on and off. It already has to scan for \n on every write. Checking an extra boolean before every write should be very little overhead.
  • BufWriter could have a method to change its capacity. Then that can be used to allow switching to 'immediate'/unbuffered mode, by simply changing the buffer size to zero.
  • The buffer size is not only relevant for block buffering mode, but also for line buffering mode.

To make progress on this, it's probably best to close this PR for now and first get consensus on the API surface, before discussing any implementation. It's not clear whether set_buffer_mode should use an enum, or separate arguments for setting the buffer size and switching line buffered mode, or separate methods for the different modes, or something more flexible/extensible.

See https://std-dev-guide.rust-lang.org/feature-lifecycle/api-change-proposals.html for how to propose an unstable API, focussing on just the interface without the implementation details.

It's advisable to propose set_buffer_mode separately from SwitchWriter or any changes to LineWriter.

@m-ou-se m-ou-se removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Jun 28, 2022
@the8472 the8472 added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 19, 2022
@Dylan-DPC
Copy link
Member

@Lucretiel any updates on this?

@dtolnay
Copy link
Member

dtolnay commented Aug 26, 2022

#78515 (comment) has guidance on how to make progress on this use case, including a suggestion that "it's probably best to close this PR". I'll do that now since it isn't something we would be able to accept in this form, but the library API team still has a desire to address the use case that motivated this PR, and we'd be happy to see that ACP and the rest of the design conveyed in Mara's comment.

@dtolnay dtolnay closed this Aug 26, 2022
@Lucretiel
Copy link
Contributor Author

While dynamically switching between buffer modes at runtime seems reasonable for stdout, it seems like a rather niche problem that's specific to stdout. This makes it questionable whether it's worth it to add a SwitchWriter type for this problem.

  • Building the switchable functionality directly into stdout just ends up looking exactly the same; I wasn't really perceiving much of an abstraction penalty, especially given that stdout is already a nested stack of abstractions over the sys::RawStdout writer.
  • It occurred to me that there's a simpler design where the mode isn't switchable after initial detection, but I was presuming that, even if we don't add it immediately, we're pretty quickly going to want to add the ability for standard library users to switch the stdout mode away from the automatically detected one
  • I didn't intend that SwitchWriter would be become a public API type anytime soon

Merging the SwitchWriter behavior into BufWriter doesn't seem great, but it might be a good idea to merge it into LineWriter: LineWriter could have a switch to turn line-buffering on and off. It already has to scan for \n on every write. Checking an extra boolean before every write should be very little overhead.

I remain staunchly opposed to this for 0-cost abstraction reasons but I'm happy to be overruled.

See https://std-dev-guide.rust-lang.org/feature-lifecycle/api-change-proposals.html for how to propose an unstable API, focussing on just the interface without the implementation details.

Noted, I'll work on a more formal proposal

It's advisable to propose set_buffer_mode separately from SwitchWriter or any changes to LineWriter.

For sure; my intention with this PR as written was that set_buffer_mode was in fact the only publicly visible API change; everything else was implementation.

@SUPERCILEX
Copy link
Contributor

@Lucretiel I created a vaguely related proposal: rust-lang/libs-team#148

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.